Visualization of data is a useful tool for better understanding of a data set. By using graphs such as scatter plots or histograms one can create a visual representation of numerical information. This visual representation may help in understanding an abstract data relationship or discovering new insights. This section shows how to create various types of visualizations such as curves and image plots in both R and Python. It also covers some of the important packages/modules needed to create such visualizations.
R has built-in functions for basic plotting needs such as plot(), hist(), points(), and many others. These functions do not require additional packages and can be used directly in an R script or console. For more advanced plotting options one can install and load the ggplot2 package. The ggplot2 package is a system for declaratively creating graphics. It is now over 10 years old and is used by hundreds of thousands of people to make plots. The ggplot2 documentation page [14] ggplot2: Elegant Graphics for Data Analysis serves as a great resource with information on installation and usage, and also provides a helpful cheat sheet. As described by the online reference in [14] "ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics. You provide the data, tell ggplot2 how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details."
Additionally, the R package 'gridExtra' is used in combination with ggplot2. As per the gridExtra documentation page [15] gridExtra: Miscellaneous Functions for "Grid" Graphics, it "provides a number of user-level functions to work with "grid" graphics, notably to arrange multiple grid-based plots on a page, and draw tables." It is simply used here to display two ggplot2 plots side by side. Normally this can be done with the built-in R function par() but the ggplot2 graphics are not compatible with this call.
The packages can be loaded into R using the calls library(ggplot2) and library(gridExtra) if already installed. If not installed, one can download a version using the calls install.packages("ggplot2") and install.packages("gridExtra"); note that the package name must be quoted in install.packages(). It should be noted that version 3.3.2 of ggplot2 and version 2.3 of gridExtra are being used.
library(ggplot2)
packageVersion("ggplot2")
[1] ‘3.3.2’
library(gridExtra)
packageVersion("gridExtra")
[1] ‘2.3’
import numpy as np
import matplotlib
Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. It is very easy to use and has tools for advanced visualization as well as all your basic needs. A warning for English speaking non-Americans: unfortunately Matplotlib does not know what "colour" is and you must use "color" to refer to this parameter.
We will make use of the pyplot and image tools within matplotlib. As described in the matplotlib documentation [16] Matplotlib: A 2D Graphics Environment "pyplot is a state-based interface to matplotlib. It provides an implicit, MATLAB-like, way of plotting. It also opens figures on your screen, and acts as the figure GUI manager. pyplot is mainly intended for interactive plots and simple cases of programmatic plot generation." The image module within matplotlib supports basic image loading, rescaling and display operations. It is used here simply to load a saved jpeg image.
Both of these modules can be imported separately from matplotlib with the calls import matplotlib.pyplot and import matplotlib.image. Here we choose to import these under the names of plt and mpimg respectively. It should be noted that version 3.5.3 of matplotlib is being used.
import matplotlib.pyplot as plt # Plotting
import matplotlib.image as mpimg # Reading images
print(matplotlib.__version__)
3.5.3
Typically scatter plots are used to explore the relationship between two numeric variables by plotting points on a Cartesian coordinate system.
Scatter plots can be created with the scatter() function which includes multiple plotting parameters such as point style, colour, size, and transparency. The Matplotlib documentation page [16] for scatter plots has a few important notes:
Here we plot some 2-dimensional Gaussian $N(0,1)$ data, generated using the NumPy random.normal() function.
data = np.random.normal(size = (2,300)) # Random Gaussian data for plotting
plt.scatter(x = data[0], # x axis
y = data[1], # y axis
marker = 'o', # default point style is a circle 'o', many options are
# available 'o':circle, 'v':triangle down, '^':triangle up,...
s = 50, # point size, default is (rcParams['lines.markersize'])^2
c = "red", # point colour, default is blue
alpha = 0.5, # alpha shading transparency of points, default is solid colour
edgecolor = "black", # colour of point edges, default is same as point colour
linewidths = 1) # width of point edges, default is 1.5
plt.xlabel("X ~ N(0,1)") # add x axis label
plt.ylabel("Y ~ N(0,1)") # add y axis label
plt.title("Random Standard Normal") # add plot title
plt.xlim([-4, 4]) # limits of x axis
plt.ylim([-4, 4]) # limits of y axis
plt.show() # to display nicely
It should be noted that some of the plotting parameters can be "vectorized" to give individual data points any desired appearance. One can create 2 separate groups visually by adjusting the plotting parameters in vector form.
# Vectorize plotting parameters into two groups
sizes = [60]*150 + [30]*150
colours = ['red']*150 + ['green']*150
edgecolor = ['blue']*150 + ['black']*150
plt.scatter(x = data[0], # x axis
y = data[1], # y axis
marker = 'o', # point style
s = sizes, # point size vectorized
c = colours, # point colour vectorized
alpha = 0.5, # alpha shading transparency
edgecolor = edgecolor, # colour of point edges vectorized
linewidths = 1) # width of point edges
plt.xlabel("X ~ N(0,1)") # x axis label
plt.ylabel("Y ~ N(0,1)") # y axis label
plt.title("Random Standard Normal") # plot title
plt.xlim([-4, 4]) # limits of x axis
plt.ylim([-4, 4]) # limits of y axis
plt.show() # display
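When the groups are not simply the first and second half of the data, the per-point parameter vectors can be derived from a vector of group labels. The following is a minimal sketch of that idea; the names points, group and edges, and the seeded generator, are assumptions made for illustration and are not part of the example above.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)          # seeded generator (assumption, for reproducibility)
points = rng.normal(size=(2, 300))      # same shape as the data used above

# Hypothetical group labels: 0 for the first 150 points, 1 for the remaining 150
group = np.repeat([0, 1], 150)

# One entry per point, chosen from the label of that point
sizes = np.where(group == 0, 60, 30)
colours = np.where(group == 0, "red", "green").tolist()   # plain list of colour names
edges = np.where(group == 0, "blue", "black").tolist()

fig, ax = plt.subplots()
ax.scatter(points[0], points[1],
           s=sizes, c=colours, edgecolors=edges,
           alpha=0.5, linewidths=1)
ax.set_title("Random Standard Normal")
```

Calling plt.show() afterwards displays the figure. Because the styling follows the label vector, the same code works for any ordering of the groups.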
Instead of vectorizing the plotting parameters one can "stack" the plots on top of each other. One can plot each group of points one at a time, each with different plotting parameters. The order in which one chooses to stack the plots will affect the layering of points. Stacking plots allows for a legend to be created since one can assign a label to each plot call.
data_group1 = data[:,0:150] # first group of data
data_group2 = data[:,150:300] # second group of data
plt.scatter(x = data_group1[0], # plot first group
y = data_group1[1],
marker = 'o',
s = 60,
c = "red",
alpha = 0.5,
edgecolor = "blue",
linewidths = 1,
label = "Group 1") # legend label
plt.scatter(x = data_group2[0], # stack plot of second group
y = data_group2[1],
marker = 'o',
c = "green",
edgecolor = "black",
s = 30,
alpha = 0.5,
linewidths = 1,
label = "Group 2") # legend label
plt.xlabel("X ~ N(0,1)") # x axis label
plt.ylabel("Y ~ N(0,1)") # y axis label
plt.title("Random Standard Normal") # plot title
plt.xlim([-4, 4]) # limits of x axis
plt.ylim([-4, 4]) # limits of y axis
plt.legend() # display legend
plt.show() # display
One can observe that the two matplotlib plotting methods produce virtually identical results.
R scatter plots can be produced using the built-in plot() function as long as the pch parameter is specified to display points. One can notice the extreme similarity to the matplotlib functions.
data <- matrix(rnorm(600), nrow = 2) # Random Gaussian data for plotting
plot(data[1,], # x axis data
data[2,], # y axis data
xlab = "X ~ N(0,1)", # x axis label
ylab = "Y ~ N(0,1)", # y axis label
main = "Random Standard Normal", # title
xlim = c(-4,4), # limits of x axis
ylim = c(-4,4), # limits of y axis
col = "black", # colour of point outline, default is black
pch = 21, # point style, default is non-filled circle
bg = adjustcolor("red", # colour of point fill
alpha.f = 0.5), # point fill transparency
cex = 1.5) # point size
Just as with using matplotlib, in R one can easily vectorize the parameters to distinguish between different groups in the data.
# Vectorize plotting parameters into two groups
colours <- adjustcolor(c(rep("red", 150), rep("green", 150)), alpha.f = 0.5)
edge_colours <- c(rep("blue", 150), rep("black", 150))
sizes <- c(rep(1.5, 150), rep(1, 150))
plot(data[1,],
data[2,],
xlab = "X ~ N(0,1)",
ylab = "Y ~ N(0,1)",
main = "Random Standard Normal",
xlim = c(-4,4),
ylim = c(-4,4),
col = edge_colours, # vectorized point edge colours, two groups
pch = 21,
bg = colours, # vectorized point colours, two groups
cex = sizes) # vectorized point sizes, two groups
# display legend
legend("topright", # legend location
legend = c("Group 1", "Group 2"), # legend text
col = c("blue", "black"), # legend point edge colour
pch = 21, # legend point style
pt.cex = c(1.5,1), # legend point size
pt.bg = adjustcolor(c("red", "green"), # legend point colour
alpha.f = 0.5))
The alternative to vectorizing parameters is to 'stack' plots on top of each other. This is done in R by first calling the plot() function, then the points() function. Both calls draw points in the same way, but only plot() initializes the plotting space, including the axis labels, title and axis limits; points() adds to the existing plot. Calling plot() twice will produce two separate scatter plots.
data_group1 <- data[,1:150] # first group of data
data_group2 <- data[,151:300] # second group of data
plot(data_group1[1,], # plot first group
data_group1[2,],
xlab = "X ~ N(0,1)",
ylab = "Y ~ N(0,1)",
main = "Random Standard Normal",
xlim = c(-4,4),
ylim = c(-4,4),
col = "blue",
pch = 21,
cex = 1.5,
bg = adjustcolor("red",
alpha.f = 0.5))
points(data_group2[1,], # stack second group of plotted points
data_group2[2,],
col = "black", # axis labels, title and limits were already set by plot()
pch = 21,
cex = 1,
bg = adjustcolor("green",
alpha.f = 0.5))
# display legend
legend("topright", # legend location
legend = c("Group 1", "Group 2"), # legend text
col = c("blue", "black"), # legend point edge colour
pch = 21, # legend point style
pt.cex = c(1.5,1), # legend point size
pt.bg = adjustcolor(c("red", "green"), # legend point colour
alpha.f = 0.5))
One can observe that the two R methods produce virtually identical results.
Histograms are great tools to approximate the distribution of numerical data. Typically they are created to represent frequency or "counts" of data which fall into specified ranges or they represent this information as a proportion or percentage.
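The difference between a histogram of counts and one of proportions can be checked numerically before any plotting is done. The sketch below uses NumPy's histogram() function on a seeded sample; the variable names and the seed are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)        # seeded generator (assumption, for reproducibility)
sample = rng.normal(size=300)

# Counts per bin: the heights sum to the number of observations
counts, bin_edges = np.histogram(sample, bins=10)

# Density per bin: the heights are scaled so that the total bar area is one
density, _ = np.histogram(sample, bins=10, density=True)
widths = np.diff(bin_edges)

print(counts.sum())                   # number of observations in the sample
print((density * widths).sum())       # total area of the density histogram
```

The density heights are simply the counts divided by the sample size times the bin width, so both versions have the same shape and differ only in the scale of the y axis.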
Histograms can be created with the hist() function which also includes multiple plotting parameters. The type of histogram to draw is specified by the histtype parameter; bar is a traditional bar-type histogram. If multiple data are given the bars are arranged side by side. barstacked is a bar-type histogram where multiple data are stacked on top of each other. step generates a line plot that is by default unfilled and stepfilled generates a line plot that is by default filled. The hist() documentation page for matplotlib [16] also notes that for large numbers of bins (>1000), step and stepfilled can be significantly faster than bar and barstacked.
# First Plot
plt.subplot(1, 3, 1) # plot position
plt.hist(data[0], # data
density = False, # displays frequencies instead of density, default is freq.
color = "Red", # box colour, default is blue
alpha = 0.5, # box transparency
orientation = "vertical", # plot orientation, default is vertical
bins = 10) # histogram bins, can be integer or sequence, default is 10
plt.title("X axis Data") # histogram title
plt.ylabel("Frequency") # y axis label
# Second plot using stacking
plt.subplot(1, 3, 2)
plt.hist(data[0],
color = "Red",
alpha = 0.5,
label = "X") # label for legend
plt.hist(data[1],
alpha = 0.5,
label = "Y", # label for legend
color = "blue")
plt.title("Layered")
plt.legend()
# Last plot
plt.subplot(1, 3, 3)
plt.hist(data[1],
alpha = 0.5,
color = "blue")
plt.title("Y axis Data")
plt.show() # display
The hist() function also allows for multi dimensional data to be stacked or grouped by setting the parameter stacked to stacked=True or stacked=False.
# first plot
plt.subplot(1, 2, 1)
plt.hist(data.T, # data
color = ["red", "blue"], # colors for each dimension
alpha = 0.5, # transparency
label = ["X", "Y"], # labels for each dimension
stacked = False) # separated and not stacked
plt.legend()
plt.ylim([0,140]) # so plots share same scale
plt.title("Separated")
plt.ylabel("Frequency")
# second plot
plt.subplot(1, 2, 2)
plt.hist(data.T, # data
color = ["red", "blue"], # colors for each dimension
alpha = 0.5, # transparency
label = ["X", "Y"], # labels for each dimension
stacked = True) # stacked not separated
plt.legend()
plt.title("Stacked")
# display
plt.show()
R has a built-in hist() function for displaying histograms. It should be noted that R's default is to plot the counts in the cells defined by equally spaced breaks. Thus the height of a rectangle is proportional to the number of points falling into the cell, as is the area, provided the breaks are equally-spaced. The default with non-equally spaced breaks is to give a plot of area one, in which the area of the rectangles is the fraction of the data points falling in the cells.
# set up plot layout
par(mfrow = c(1,3), # three plots side by side
mar = c(5.1, 4, 4.1, 0.5)) # spacing between plots
# first plot
hist(data[1,],
col = adjustcolor("red", 0.5), # box colour
breaks = 12, # bin spacing, can be integer or sequence
freq = TRUE, # frequencies and not density
main = "X axis Data", # title
xlab ="") # remove x axis label
# second plot
hist(data[1,],
col = adjustcolor("red", 0.5),
breaks = 12,
freq = TRUE,
main = "Layered",
xlab ="",
ylab = "")
# layer over top
hist(data[2,],
col = adjustcolor("blue", 0.5),
breaks = 12,
freq = TRUE,
xlab = "",
ylab = "",
add = T) # so plotting knows this is to be overlayed
# add legend to second plot
legend("topleft",legend = c("X", "Y"),
col = adjustcolor(c("red", "blue"), 0.5),
pch = 15)
# last plot
hist(data[2,],
col = adjustcolor("blue", 0.5),
breaks = 12,
freq = TRUE,
main = "Y axis Data",
xlab = "",
ylab = "")
For multi-dimensional data the ggplot2 geom_histogram() call is used to create the grouped and stacked histograms.
# move data into data frame object
df <- data.frame(d = c(data[1,], data[2,]),
group = c(rep("X", 300), rep("Y", 300)))
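# second data frame for the stacked look: the "X" group holds both series (600 values),
# so drawing the "Y" bars over it with position = "identity" gives the appearance of a stacked histogram
df2 <- data.frame(d = c(data[1,], data[2,], data[2,]),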
group = c(rep("X", 600), rep("Y", 300)))
# first plot (separated)
p <- ggplot(df, aes(d, fill = group)) + # define plotting space and groups
geom_histogram(position = "dodge", bins = 10) + # grouped (side by side) histogram with 10 bins
guides(fill = guide_legend(title = NULL)) + # remove legend title
ylim(0, 120) + # set y axis limits
theme_classic() + # set plotting theme, default background is grey grid
xlab("") + # remove x axis label
ylab("Frequency") + # edit y axis label
ggtitle("Separated") + # edit plot title
theme(legend.position = c(0.8, 0.8)) + # move position of legend
theme(plot.title = element_text(hjust = 0.5)) # move title to center of plot, default is left
# second plot
p2 <- ggplot(df2, aes(x = d, fill = group)) + # define plotting space and groups
geom_histogram(bins = 13, position="identity") + # define histogram bins and style
theme_classic() +
guides(fill=guide_legend(title=NULL)) +
xlab("") +
ylab("") + # remove y axis label
ggtitle("Stacked") + # edit title
theme(legend.position = c(0.8, 0.8)) +
theme(plot.title = element_text(hjust = 0.5))
# display plots side by side
grid.arrange(p, p2, ncol = 2)
Plotting lines or curves is one of the easiest visualization concepts to understand. It is simply plotting sequences of one dimensional data arrays, similar to a scatter plot or plotting points, but the points are connected in order.
Curves can be plotted with the Matplotlib function plot(). Many plotting arguments can be passed to it, such as the line style, line width, and colour.
# data
domain = np.linspace(-10, 10, 1000) # x axis domain
function1 = np.sinc(domain) # normalized sinc function sin(pi*x)/(pi*x)
function2 = np.cos(domain) # cosine function (cos(x))
function3 = domain ** 2 # quadratic function
function4 = np.exp(domain) # exponential function
# Stack 4 curves on same plot
plt.plot(domain, function1, # data
label = "sinc(x)", # label for legend
color = "blue", # colour of curve (line)
linestyle = "-", # line style, many options to choose from, default is solid line "-"
lw = 4) # line width
plt.plot(domain, function2,
label = "cos(x)",
color = "red", # if no color is specified Matplotlib picks the next colour in its default cycle
linestyle = "--",
lw = 1)
plt.plot(domain, function3,
label = "x^2",
color = "green",
linestyle = "-.",
lw = 2)
plt.plot(domain, function4,
label = "exp(x)",
color = "orange",
linestyle = ":",
lw = 3)
# Display plot
plt.ylim([-1.2, 4])
plt.xlim([-11, 11])
plt.legend(loc = "upper right")
plt.xlabel("x")
plt.title("Plotting Curves")
plt.show()
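It should be noted that np.sinc() is the normalized sinc function, sin(pi*x)/(pi*x), and not sin(x)/x. This can be confirmed with a short numerical check; the test points below are arbitrary values chosen for illustration.

```python
import numpy as np

x = np.array([-1.5, 0.25, 2.5])                    # arbitrary non-zero test points
manual = np.sin(np.pi * x) / (np.pi * x)           # normalized sinc written out by hand

print(np.allclose(np.sinc(x), manual))             # compares NumPy's sinc with the formula
print(np.sinc(0.0))                                # value at the removable singularity x = 0
```

NumPy handles x = 0 internally, which is why np.sinc() can be evaluated on a domain containing zero, whereas the hand-written ratio would divide by zero there.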
Matplotlib also has a useful function fill_between() which allows one to fill space between curves. This can come in handy when displaying confidence intervals or data with variation.
# 100 trials of 10 N(0,1) samples
trials = np.random.normal(size = (100,10))
avgs = np.mean(trials, axis = 1)
stds = np.std(trials, axis = 1, ddof = 1) # sample SD (ddof = 1), as computed by R's sd()
plt.fill_between(np.linspace(0,100,100), # domain space (x axis)
avgs-stds, # lower limit of filled area
avgs+stds, # upper limit of filled area
color = "blue", # colour of filled space
alpha = 0.3, # transparency of filled space
label = "+- 1 Sample SD") # label for legend
plt.plot(np.linspace(0,100,100),
avgs, # average line
color = "blue",
lw = 2,
label = "Sample Average")
# Display plot
plt.xlabel("Trials")
plt.legend(loc = "upper right")
plt.ylim([-2, 2.8])
plt.title("N(0,1) Samples")
plt.show()
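The same technique can be used to shade an approximate confidence interval rather than one standard deviation. The sketch below is only an illustration under stated assumptions: it uses a normal approximation with the multiplier 1.96, which is rough for samples of size 10, and the variable names and seed are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)                 # seeded generator (assumption)
trials = rng.normal(size=(100, 10))            # 100 trials of 10 N(0,1) samples
n = trials.shape[1]

means = trials.mean(axis=1)
sems = trials.std(axis=1, ddof=1) / np.sqrt(n) # standard error of each trial mean

# Rough 95% interval under a normal approximation (assumption)
lower = means - 1.96 * sems
upper = means + 1.96 * sems
index = np.arange(1, 101)                      # trial number on the x axis

fig, ax = plt.subplots()
ax.fill_between(index, lower, upper, color="blue", alpha=0.3,
                label="Approx. 95% interval")
ax.plot(index, means, color="blue", lw=2, label="Sample Average")
ax.set_xlabel("Trials")
ax.legend(loc="upper right")
```

Calling plt.show() afterwards displays the figure. The first y argument of fill_between() is the lower boundary and the second the upper boundary of the shaded region.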
The built-in R functions plot() and points() can be used to display curves by setting the type argument to type = "l", which is used for lines.
# data
domain <- seq(from = -10, to = 10, length = 1000) # x axis domain
func1 <- sin(pi * domain) / (pi * domain) # sinc function
func2 <- cos(domain) # cosine function
func3 <- domain^2 # quadratic function
func4 <- exp(domain) # exponential function
# Stack 4 curves on same plot
plot(domain, # x axis data
func1, # y axis data
ylim = c(-1, 4), # y axis range
type = "l", # plot type is line, default is points
col = "blue", # colour
lwd = 4, # line width
xlab = "x", # x axis label
ylab = "", # y axis label
main = "Plotting Curves") # title
points(domain,
func2,
type = "l",
col = "red",
lwd = 1,
lty = "dashed") # line type, default is solid
points(domain,
func3,
type = "l",
col = "green",
lwd = 2,
lty = "dotdash")
points(domain,
func4,
type = "l",
col = "orange",
lwd = 3,
lty = "dotted")
# add legend to plot
legend("topright",
legend = c("sinc(x)", "cos(x)", "x^2", "exp(x)"), # legend text
lty = c("solid", "dashed", "dotdash", "dotted"), # legend lines
col = c("blue","red", "green", "orange"), # legend line colours
lwd = c(4,1,2,3), # legend line widths
cex = 0.7) # legend size, default is cex = 1
To replicate the results of the matplotlib function fill_between() in R, the polygon() function can be used. The polygon() function draws a polygon whose vertices are given by two vectors of x and y coordinates. Tracing the upper curve forwards and the lower curve backwards produces the same filled band as the matplotlib fill_between().
# 100 trials of 10 N(0,1) samples
trials <- matrix(rnorm(100*10), ncol = 10)
avgs <- rowMeans(trials)
sds <- apply(trials, 1, sd)
# plot average line
plot(1:100, avgs,
col = "blue",
type = "l",
lwd = 2.5,
ylim = c(-2,3),
ylab = "",
xlab = "Trials",
main = "N(0,1) Samples")
# add +- 1 standard deviation to plot
polygon(c(1:100,rev(1:100)), # x axis of shading area
c(avgs + sds, rev(avgs - sds)), # top and bottom of shading area
col = adjustcolor("blue", 0.3)) # shading colour
# legend
legend("topright",
legend = c("Sample Average", "+- 1 Sample SD"),
lty = c("solid", NA), # use NA when not using a vectorized parameter
lwd = c(2.5, NA),
col = c("blue", "black"),
pch = c(NA, 22),
cex = 0.7,
pt.bg = c("blue", adjustcolor("blue", 0.3)),
pt.cex = c(1, 1.5))
Images can be thought of as grid spaces in which each grid point acts as a "pixel" with its own "pixel value" or pixel colour. Arbitrary grid spaces can be defined with functions which define the pixel value. These are known as field plots.
To define a domain for field plots we make use of the NumPy function meshgrid(). This allows creation of a grid space in the (x,y) space and evaluation of functions of the form f(x,y) over this grid space. One can make use of the imshow() function to display the results of an evaluation over this grid space. Many options are available inside the imshow() function which can alter the visualization.
# Data
x,y = np.meshgrid(np.linspace(-3, 3, 50), # grid space on x axis, 50 "blocks" between -3 and 3
np.linspace(-3, 3, 50)) # grid space on y axis, 50 "blocks" between -3 and 3
z = (1 - x/2 + x**5 + y**3) * np.exp(-x**2 - y**2) # evaluation over grid space
# First plot
plt.subplot(1, 2, 1)
plt.imshow(z, # data
cmap = "Blues", # colour map, many options available, default is "viridis"
aspect = "auto", # adjust pixel size, default is "equal" with all pixels same square size
origin = "lower", # location of origin [0,0], default is "upper" top left corner
alpha = 0.8) # adjust transparency
plt.title("Non-Square Pixels")
# Second plot
plt.subplot(1, 2, 2)
plt.imshow(z,
cmap = "Oranges", # new colour mapping
aspect = "equal", # all pixels same square size
origin = "lower",
alpha = 0.8)
plt.title("Square Pixels")
plt.show() # display plots nicely
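To see what meshgrid() returns, it helps to look at a very small grid. The following sketch uses deliberately different numbers of points on each axis so that the orientation of the output is visible; the names xs, ys and the function evaluated are assumptions made for illustration.

```python
import numpy as np

xs = np.linspace(-3, 3, 5)     # 5 points along the x axis
ys = np.linspace(-1, 1, 3)     # 3 points along the y axis
X, Y = np.meshgrid(xs, ys)     # default "xy" indexing

print(X.shape, Y.shape)        # both arrays have one row per y value and one column per x value
print(X[0])                    # every row of X repeats the x coordinates
print(Y[:, 0])                 # every column of Y repeats the y coordinates

# Any f(x, y) written with array operations is evaluated at all grid points at once
Z = X + 10 * Y
```

Because rows follow the y axis, imshow() with its default origin = "upper" would draw the smallest y value at the top, which is why origin = "lower" is set in the plots above.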
The imshow() function can also be used in a layering technique where one can "stack" or "layer" image plots over one another. Here we must use the alpha parameter which changes the transparency of the displayed images; otherwise, one will see only the top layer, or top image.
Z1 = np.add.outer(range(8), range(8)) % 2 # background image (8x8 squares)
edges = np.min(x), np.max(x), np.min(y), np.max(y) # set layered images on same axis
# First image
plt.imshow(Z1,
interpolation = 'nearest', # interpolation method used, default is 'antialiased'
extent = edges, # bounding box that the image will fill, default is based on origin location
alpha = 0.5)
# Second image overlayed
plt.imshow(z,
cmap = 'Oranges',
alpha = 0.9,
interpolation = 'bilinear',
extent = edges)
plt.title("Layered Images")
# display
plt.show()
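The effect of the alpha parameter can be illustrated with a little arithmetic. The sketch below assumes the standard "over" compositing rule for a layer placed on an opaque background and works on plain grey values; it is a simplification, since Matplotlib blends the colour-mapped pixels, and blend_over() is a hypothetical helper rather than a Matplotlib function.

```python
import numpy as np

def blend_over(top, bottom, alpha):
    """Hypothetical helper: composite `top` over an opaque `bottom` layer."""
    return alpha * top + (1.0 - alpha) * bottom

bottom = np.array([[0.0, 1.0],
                   [1.0, 0.0]])          # small checkerboard of grey values
top = np.full((2, 2), 0.5)               # uniform mid-grey layer

mostly_top = blend_over(top, bottom, alpha=0.9)   # checkerboard faintly visible
only_top = blend_over(top, bottom, alpha=1.0)     # checkerboard completely hidden

print(mostly_top)
print(only_top)
```

With alpha = 1 the result equals the top layer, which matches the remark above that only the top image is seen when no transparency is used.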
Within matplotlib's image tools one can make use of the imread() function to import images into memory. Images can be stored in a red-green-blue (RGB) format where pixels are split into three channels of values in the red, green and blue colours.
img = mpimg.imread('map.jpeg')
This imported image has three channels of pixels stored as values in three matrices of size 934 by 1600. These channels can be accessed by indexing the image's third axis: index 0 is the red channel, index 1 the green, and index 2 the blue. The channels can then be extracted from the image and individually plotted.
plt.imshow(img) # plot original RGB image
plt.title("Original Image")
plt.axis('off') # removes axis numbers, no "ticks"
plt.show()
red_channel = img[:,:,0] # extract red channel
green_channel = img[:,:,1] # extract green channel
blue_channel = img[:,:,2] # extract blue channel
# original plot
plt.subplot(2,2,1)
plt.imshow(img)
plt.title("Original")
plt.axis('off')
# red channel plot
plt.subplot(2, 2, 2)
plt.imshow(red_channel, cmap = "Reds")
plt.title("Red Channel")
plt.axis('off')
# green channel plot
plt.subplot(2, 2, 3)
plt.imshow(green_channel, cmap = "Greens")
plt.title("Green Channel")
plt.axis('off')
# blue channel plot
plt.subplot(2, 2, 4)
plt.imshow(blue_channel, cmap = "Blues")
plt.title("Blue Channel")
plt.axis('off')
# display
plt.show()
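The channel indexing can also be tried on a small synthetic image, which avoids the need for an image file. The array below is a made-up 4 by 6 RGB image with a red, a green and a blue stripe; all names and sizes are assumptions made for illustration.

```python
import numpy as np

height, width = 4, 6
img = np.zeros((height, width, 3), dtype=np.uint8)   # rows, columns, channels
img[:, 0:2, 0] = 255    # left stripe: red channel only
img[:, 2:4, 1] = 255    # middle stripe: green channel only
img[:, 4:6, 2] = 255    # right stripe: blue channel only

# Each channel is a 2D array with the same height and width as the image
red = img[:, :, 0]
green = img[:, :, 1]
blue = img[:, :, 2]
print(img.shape, red.shape)

# Stacking the channels along the third axis rebuilds the original image
rebuilt = np.dstack([red, green, blue])
print(np.array_equal(rebuilt, img))
```

Passing img to plt.imshow() would display the three stripes in colour, while passing a single channel displays it through a colour map, as in the plots above.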
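In R, the meshgrid() function from the pracma package [10] plays the role of NumPy's meshgrid(), and the built-in image() function is used to display the evaluation over the grid space.
# Data
library(pracma) # provides meshgrid() [10]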
ticks <- seq(from = -3, to = 3, length = 50) # grids "ticks"
grid <- meshgrid(ticks, y = ticks) # grid with 50 points between -3 and 3 on both axis
X <- grid$X
Y <- grid$Y
z <- (1 - X/2 + X^5 + Y^3) * exp(-X^2 - Y^2) # evaluate on grid space
# Display side by side
par(mfrow = c(1,2),
mar = c(5.1, 2, 4.1, 0.5)) # adjust margins
# First plot
image(z = z,
main = "Non-Square Pixels", # title
col = hcl.colors(50, palette = "Blues", alpha = 0.8), # colour mapping
asp = 2) # change pixel size, default is square asp = 1
# Second plot
image(z = z,
main = "Square Pixels",
col = hcl.colors(50, palette = "Oranges", alpha = 0.8)) # new colour mapping
A layering technique is used to "stack" or "layer" image plots over one another. Here the alpha parameters of the layered images are set to produce a blended image.
par(mfrow = c(1,1)) # display one plot
# first image
checkers <- matrix(rep(c(rep(c(0,1),4), rep(c(1,0), 4)), 4), ncol = 8) # "checkers"
image(z = checkers,
main = "Layered Images",
col = hcl.colors(50, palette = "Oranges", alpha = 0.5),
xlim = c(0,1), # limits of image x axis
ylim = c(0,1)) # limits of image y axis
# second image
image(z = z,
col = hcl.colors(50, palette = "Oranges", alpha = 0.8),
add = T) # add image over top
For tasks such as loading images into R we make use of the jpeg package. As per the jpeg documentation page [17] jpeg: Read and Write JPEG images "This package provides an easy and simple way to read, write and display bitmap images stored in the JPEG format. It can read and write both files and in-memory raw vectors." We use the readJPEG() function to read the JPEG image from a file into memory and the base R rasterImage() function to draw the raster image at a specified location and size.
Just as with all R packages, it can be loaded with the call library(jpeg) and installed if needed with the call install.packages("jpeg"). It should be noted that version 0.1.8.1 of jpeg is being used.
library(jpeg) # For reading images into memory
img <- readJPEG("map.jpeg") # get image
packageVersion("jpeg")
[1] ‘0.1.8.1’
The original image is displayed, and its red, green and blue channels are then extracted and plotted individually.
# plot original RGB image
plot.new()
rasterImage(img, # data
xleft = 0, # plotting regions
xright = 1,
ytop = 1,
ybottom = 0,
interpolate = FALSE) # no interpolation when drawing
title(paste0("Original Image")) # add title to image
# Display 4 plots
par(mfrow = c(2,2), mar = c(1,1,1,1))
# original
plot.new()
rasterImage(img,
xleft = 0,
xright = 1,
ytop = 1,
ybottom = 0,
interpolate = FALSE)
title(paste0("Original Image"))
# Get RGB channels
r <- img[,,1]
g <- img[,,2]
b <- img[,,3]
# Red channel
image(t(r[nrow(r):1,]),
col = hcl.colors(50, palette = "Reds", alpha = 0.5, rev = T),
main = "Red Channel",
xaxt = "n", # remove axis
yaxt = "n")
# Green channel
image(t(g[nrow(g):1,]),
col = hcl.colors(50, palette = "Greens", alpha = 0.5, rev = T),
main = "Green Channel",
xaxt = "n",
yaxt = "n")
# blue channel
image(t(b[nrow(b):1,]),
col = hcl.colors(50, palette = "Blues", alpha = 0.5, rev = T),
main = "Blue Channel",
xaxt = "n",
yaxt = "n")
3D visualizations are a direct extension from 2D visualizations with the addition of an additional axis. Because we live in a three dimensional reality, these types of plots may better represent physical or real world spatial data.
One can create 3D axes with pyplot's axes() function by passing projection='3d', and then call the plot3D() and scatter3D() methods of the returned axes object. These are easy to use and similar to 2D plotting, with the addition of the third z axis.
# lay out 3d plotting space
fig = plt.axes(projection='3d')
# create 3D curve (line)
zline = np.linspace(0, 15, 1000)
xline = np.sin(zline)
yline = np.cos(zline)
# plot 3D curve (line)
fig.plot3D(xline, yline, zline, 'black')
# create 3D data points
zdata = 15 * np.random.random(300)
xdata = np.sin(zdata) + 0.1 * np.random.randn(300)
ydata = np.cos(zdata) + 0.1 * np.random.randn(300)
# plot 3D data points over line
fig.scatter3D(xdata, ydata, zdata, c=zdata, cmap = 'plasma')
# display
plt.title("3D Plotting")
plt.xlabel("X")
plt.ylabel("Y")
plt.show()
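The pyplot interface has xlabel() and ylabel() shortcuts but no equivalent for the z axis, so the z label has to be set on the axes object itself. The sketch below shows one way to do this with Matplotlib's object-oriented interface; the variable names are assumptions made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0, 15, 200)               # parameter along the curve

fig = plt.figure()
ax = fig.add_subplot(projection="3d")     # 3D axes created on an explicit figure

ax.plot(np.sin(t), np.cos(t), t, color="black")   # helix: x = sin(t), y = cos(t), z = t

# All three axis labels are set through methods of the axes object
ax.set_xlabel("X")
ax.set_ylabel("Y")
ax.set_zlabel("Z")
ax.set_title("3D Plotting")
```

Calling plt.show() afterwards displays the figure. Keeping separate names for the figure and the axes also avoids the slightly confusing habit of storing an axes object in a variable called fig.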
One can also create contour plots with the contour3D() function, wireframe plots with the plot_wireframe() function, and surface plots with the plot_surface() function. Other options exist within the pyplot toolbox and can be used in a very similar way.
# data
x = np.linspace(-6, 6, 30)
y = np.linspace(-6, 6, 30)
X, Y = np.meshgrid(x, y)
Z = np.sin(np.sqrt(X ** 2 + Y ** 2))
# contours
fig = plt.axes(projection='3d')
fig.contour3D(X, Y, Z,
50,
cmap='plasma')
# display
plt.title("3D Contours")
plt.xlabel("X")
plt.ylabel("Y")
plt.show()
# wireframe plot
fig = plt.axes(projection='3d')
fig.plot_wireframe(X, Y, Z,
color='black',
alpha = 0.8,
linewidth=0.3)
# display
plt.title("3D Wireframe")
plt.xlabel("X")
plt.ylabel("Y")
plt.show()
# surface plot
fig2 = plt.axes(projection='3d')
s = fig2.plot_surface(X, Y, Z,
rstride=1,
cstride=1,
cmap='plasma',
edgecolor='black',
linewidth=0.2)
# Display
plt.title("3D Surface Plot")
plt.xlabel("X")
plt.ylabel("Y")
plt.colorbar(s, shrink = 0.6, aspect = 5) # add colour bar if desired
plt.show()
As per the plot3D documentation page [18] plot3D: Plotting Multi-Dimensional Data the package "provides many functions for viewing 2-D and 3-D data, including perspective plots, slice plots, surface plots and scatter plots." It should be noted that many of the functions are extensions of R’s persp() or image() function. Here we use the functions lines3D, scatter3D and contour3D from the plot3D package.
The package can be loaded into R using the call library(plot3D) if already installed. If not installed, one can download a version using the call install.packages("plot3D"). It should be noted that version 1.3 of plot3D is being used.
library(plot3D) # 3D plotting package
packageVersion("plot3D")
[1] ‘1.3’
The scatter3D() function is used to create a scatter of points in 3D while the lines3D() function is used here to draw the 3D line.
colours <- hcl.colors(100,palette = "plasma") # plasma colour scheme
# create 3D curve (line)
zline <- seq(0, 15, length.out = 1000)
xline <- sin(zline)
yline <- cos(zline)
# plot 3D curve (line)
lines3D(x = xline, y = yline, z = zline, # data
col = "black", # line colour
theta = 30, # 3D viewing perspective
phi = 30, # 3D viewing perspective
ticktype = 'detailed', # add axis ticks
xlab = "X",
ylab = "Y",
zlab = "Z",
cex.axis = 0.6, # axis numbers font size
main = '3D Plotting')
# create 3D data points
set.seed(10) # reproduce same data
zdata <- 15*runif(300) # z axis
xdata <- sin(zdata) + 0.05*rnorm(300) # x axis
ydata <- cos(zdata) + 0.05*rnorm(300) # y axis
# plot 3D data points over line
scatter3D(x = xdata, y = ydata, z = zdata, # data
colkey = FALSE, # remove colour bar
theta = 30,
phi = 30,
ticktype = 'detailed',
xlab = "X",
ylab = "Y",
zlab = "Z",
cex.axis = 0.6,
add = TRUE, # add over existing line plot
pch = 16, # point style (filled circles)
col = adjustcolor(colours, alpha.f = 0.7)) # transparency of points
For a 3D contour plot the contour3D() function is used.
# function to create z space
f <- function(x, y){
sin(sqrt(x^2 + y^2))
}
# data
x <- y <- seq(-6, 6, length = 30) # x and y axis same
z <- outer(x, y, f) # evaluation
# contours
contour3D(x = x, y = y, z = z, # data
colvar = z, # contours based on z axis
nlevels = 80, # number of contour lines
col = colours, # contour colours
theta = 30,
phi = 30,
ticktype = 'detailed',
xlab = "X",
ylab = "Y",
zlab = "Z",
cex.axis = 0.6,
main = "3D Contours",
colkey = FALSE) # remove colour bar
The persp() function is used here to create both a wireframe plot and a surface plot. The only difference is how the col parameter is specified: in the wireframe plot it is set to NA, while in the surface plot a plasma colour scheme is used. Note that persp() is a built-in R function and does not need any external packages in order to be used.
# wireframe plot
persp(x, y, z, # data
main = "3D Wireframe",
theta = 30,
phi = 30,
border = "black", # colour of lines
box = T, # include box around plot
ticktype = 'detailed',
xlab = "X",
ylab = "Y",
zlab = "Z",
col = NA, # no colour between wire lines
lwd = 0.5, # line thickness
cex.axis = 0.6)
# colour tool to fill the surfaces of plot
zfacet <- (z[-1, -1] + z[-1, -ncol(z)] + z[-nrow(z), -1] + z[-nrow(z), -ncol(z)])/4
facetcol <- cut(zfacet, 100)
# surface plot
persp(x, y, z, # data
main = "3D Surface Plot",
theta = 30,
phi = 30,
border = "black",
box = T,
ticktype = 'detailed',
xlab = "X",
ylab = "Y",
zlab = "Z",
col = colours[facetcol], # colour between lines based on colour tool
lwd = 0.5,
cex.axis = 0.6)
[10] Hans W. Borchers, 2022, Package 'pracma' (Practical Numerical Math Functions), [link]
[14] H. Wickham, 2016. ggplot2: Elegant Graphics for Data Analysis, [link]
[15] Baptiste Auguie, Anton Antonov, 2017. gridExtra: Miscellaneous Functions for "Grid" Graphics, [link]
[16] J. D. Hunter, 2007. Matplotlib: A 2D Graphics Environment, [link]
[17] Simon Urbanek, 2019. jpeg: Read and Write JPEG images, [link]
[18] Karline Soetaert, 2019. plot3D: Plotting Multi-Dimensional Data, [link]